Create Network Version w/o Reassortant Edges

11 February 2015 10:48:AM

I once compared the networks with and without reassortant edges. The conclusion was that the reassortant edges connect the various subtypes into a global network of gene exchange. In the 2nd graph construction run, I did not have a step where I arbitrarily cut off the network at 7.0, but instead did a lowest-10% re-look for edges. The full complement graph will therefore look different from the 1st run, but I think the conclusions remain the same. That said, I will use this notebook to create a version of the Final Graph without reassortant edges.



In [1]:

    
import networkx as nx



In [2]:

    
G = nx.read_gpickle('20141103 All IRD Final Graph.pkl')



In [3]:

    
fullG = G.copy()

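# Iterate over a snapshot of the edges, since removing edges while
# iterating over the live edge view fails on newer networkx versions.
for sc, sk, d in list(fullG.edges(data=True)):
    if d['edge_type'] == 'reassortant':
        fullG.remove_edge(sc, sk)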
        
nx.write_gpickle(fullG, '20141103 All IRD Full Complement Only Graph.pkl')
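
The same filtering step can be sketched on a toy graph, as a minimal illustration rather than the code used for the Final Graph. The node names, the toy edges, and the helper name without_edge_type are all made up here; only the edge_type attribute and its two values come from this notebook.

```python
import networkx as nx

# Toy directed graph; node names and edges are made up for illustration.
toy = nx.DiGraph()
toy.add_edge('A', 'B', edge_type='full_complement')
toy.add_edge('A', 'C', edge_type='reassortant')
toy.add_edge('B', 'C', edge_type='reassortant')
toy.add_edge('C', 'D', edge_type='full_complement')


def without_edge_type(graph, edge_type):
    """Return a copy of graph with every edge of the given edge_type removed."""
    pruned = graph.copy()
    # Collect the edges to drop first, then remove them in one call,
    # so the graph is never modified while it is being iterated over.
    to_drop = [(sc, sk) for sc, sk, d in pruned.edges(data=True)
               if d['edge_type'] == edge_type]
    pruned.remove_edges_from(to_drop)
    return pruned


pruned = without_edge_type(toy, 'reassortant')
print(sorted(pruned.edges()))
print(toy.number_of_edges(), pruned.number_of_edges())
```

Collecting the edges into a list before removing them keeps the original graph untouched and avoids depending on how a particular networkx version implements edges().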

Check Subtype Edges

11 February 2015 11:05:AM

Just to make sure that the graph was built correctly, I will check that the two nodes joined by each edge share the same HA or NA (reassortant edges) or both (full complement edges).



In [4]:

    
# Check full edges
for sc, sk, d in G.edges(data=True):
    if d['edge_type'] == 'full_complement':
        sc_subtype = G.node[sc]['subtype']
        sk_subtype = G.node[sk]['subtype']
        
        mixed = ['mixed', 'Mixed']
        if sc_subtype != sk_subtype and sc_subtype not in mixed and sk_subtype not in mixed:
            print(sc_subtype, sk_subtype, sc, sk) # nothing should be printed.



In [5]:

    
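# Define a function to remove all in_edges from a node.
def remove_in_edges(G, node):
    # Copy the in-edges to a list first so that edges can be removed safely
    # without modifying the graph while iterating over it.
    for sc, sk in list(G.in_edges(node)):
        if G.has_edge(sc, sk):
            G.remove_edge(sc, sk)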



In [6]:

    
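# Check reassortant edges
# Iterate over a snapshot of the edges, because remove_in_edges() below
# modifies G inside the loop.
for sc, sk, d in list(G.edges(data=True)):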
    if d['edge_type'] == 'reassortant':
        sc_subtype = G.node[sc]['subtype']
        sk_subtype = G.node[sk]['subtype']
        
        mixed = ['mixed', 'Mixed']
        
        if sc_subtype not in mixed and sk_subtype not in mixed:
            sc_ha = sc_subtype.split('N')[0].split('H')[1]
            sk_ha = sk_subtype.split('N')[0].split('H')[1]
            
            sc_na = sc_subtype.split('N')[1]
            sk_na = sk_subtype.split('N')[1]
            
            if 4 in d['segments'].keys() and sc_ha != sk_ha:
                print(sc, sk, sc_subtype, sk_subtype, d['segments'][4]) # nothing should be printed
                remove_in_edges(G, sk)
                
            if 6 in d['segments'].keys() and sc_na != sk_na:
                print(sc, sk, sc_na, sk_na, d['segments'][6]) # nothing should be printed
                remove_in_edges(G, sk)
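
The split-based parsing in the loop above assumes that every non-mixed subtype string looks like H3N2. A slightly more defensive way to pull out the HA and NA numbers might look like the sketch below. The helper name parse_subtype and the regular expression are my own assumptions, not part of the original pipeline, and any subtype label that does not match the HxNy pattern is simply reported as None.

```python
import re

# Assumed format: 'H' + digits + 'N' + digits, e.g. 'H3N2' or 'H10N7'.
SUBTYPE_PATTERN = re.compile(r'^H(\d+)N(\d+)$')


def parse_subtype(subtype):
    """Return (ha, na) as strings, or None if subtype is not of the form HxNy."""
    match = SUBTYPE_PATTERN.match(subtype)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_subtype('H3N2'))
print(parse_subtype('H10N7'))
print(parse_subtype('Mixed'))
```

Returning None for anything unexpected means labels such as mixed or Mixed are skipped the same way the check above skips them, and a malformed label does not raise an IndexError partway through the loop.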

Check whether the collection_date field is missing from any node's metadata.



In [7]:

    
for n, d in G.nodes(data=True):
    if 'collection_date' not in d.keys():
        print(n)

Check to make sure that every sink node has a later collection date than its source node.



In [8]:

    
counter = 0
for sc, sk, d in G.edges(data=True):
    sc_time = G.node[sc]['collection_date']
    sk_time = G.node[sk]['collection_date']
    
    if sk_time <= sc_time:
        print(sc_time, sk_time)
        counter += 1

print(counter)  # number of edges whose sink is not later than its source
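
To make the comparison in this check concrete, here is a small stand-alone sketch on made-up data. It assumes the collection dates are comparable objects such as datetime.date (the notebook does not say how collection_date is stored), and the function name find_time_violations and the isolate names are hypothetical.

```python
from datetime import date


def find_time_violations(edges, collection_dates):
    """Return the (source, sink) pairs whose sink is not strictly later than the source."""
    return [(sc, sk) for sc, sk in edges
            if collection_dates[sk] <= collection_dates[sc]]


# Made-up isolates and dates, for illustration only.
collection_dates = {
    'A': date(2009, 4, 1),
    'B': date(2009, 6, 15),
    'C': date(2009, 6, 15),
}
edges = [('A', 'B'), ('B', 'C'), ('C', 'A')]

violations = find_time_violations(edges, collection_dates)
print(violations)
```

Because the comparison is <=, an edge between two isolates collected on the same day is flagged too, which is the same behaviour as the loop above.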

Check to make sure that there are no zero-pwi edges.



In [9]:

    
for sc, sk, d in G.edges(data=True):
    if d['pwi'] == 0:
        print(sc, sk, d)


